Median


The median is the value of the middle rank, which divides the arranged series into two
equal halves. It depends on the position of a value in the series, not on its magnitude.
The essential step in calculating the median is to arrange the data in ascending or
descending order and then find the value of the middle-ranking item. When the number
of observations is even, the median is the average of the two middle-ranking values.


Mode 


The mode is the value that occurs with the maximum frequency in a data set.
Each of these measures is a different method of determining a single representative
number, and each is suited to different types of data sets.


Mean 


Mean is the simple arithmetic average of the different values of a variable. The
methods for calculating the mean differ for ungrouped and grouped data. In both
cases, the mean can be calculated by either the direct or the indirect method.


Computing Mean from Ungrouped Data 


Direct Method 


While calculating the mean from ungrouped data using the direct method, the values
of all the observations are added and the sum is divided by the total number of
observations. The mean is calculated using the following formula:


$$\bar{X} = \frac{\sum x}{N}$$


Where,
$\bar{X}$ = Mean
$\sum$ = Sum of a series of measures
$x$ = A raw score in a series of measures
$\sum x$ = The sum of all the measures
$N$ = Number of measures


Example 2.1 : Calculate the mean rainfall for Malwa Plateau in Madhya Pradesh from
the rainfall of the districts of the region given in Table 2.1:


Table 2.1 : Calculation of Mean Rainfall

Districts in Malwa Plateau | Normal Rainfall in mms (x) | Indirect Method (d = x - 800*)
Indore
Dewas
Dhar
Ratlam
Ujjain
Mandsaur
Shajapur

* Where 800 is assumed mean. d is deviation from the assumed mean.


The mean for the data given in Table 2.1 is computed as under:

$$\bar{X} = \frac{\sum x}{N} = \frac{6{,}484}{7} = 926.29$$



It could be noted from the computation of the mean that the raw rainfall data
have been added directly and the sum is divided by the number of observations,
i.e. districts. Therefore, it is known as the direct method.
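The direct method can be sketched in Python as follows. This is only an illustration: the seven rainfall figures are assumed values, chosen so that their total agrees with the sum of 6,484 mm used in Example 2.1, and the function name direct_mean is hypothetical.

```python
# Direct method: add the raw values and divide by the number of observations.
# NOTE: the rainfall values (mm) are assumed for illustration; only their
# total (6,484) and their count (7) are taken from Example 2.1.
rainfall = {
    "Indore": 979, "Dewas": 1083, "Dhar": 833, "Ratlam": 896,
    "Ujjain": 891, "Mandsaur": 825, "Shajapur": 977,
}

def direct_mean(values):
    """Return the arithmetic mean: sum of the values / number of values."""
    values = list(values)
    return sum(values) / len(values)

total = sum(rainfall.values())
mean_rainfall = direct_mean(rainfall.values())
print(total, round(mean_rainfall, 2))
```

With these assumed figures the script reproduces the total and the mean worked out in the example.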


Indirect Method


For a large number of observations, the indirect method is normally used to
compute the mean. It helps in reducing the values of the observations to smaller
numbers by subtracting a constant value from them. For example, as shown in
Table 2.1, the rainfall values lie between 800 and 1100 mm. We can reduce
these values by selecting an ‘assumed mean’ and subtracting the chosen number
from each value. In the present case, we have taken 800 as the assumed mean. Such
an operation is known as coding. The mean is then worked out from these reduced
numbers (Column 3 of Table 2.1).
The following formula is used in computing the mean using the indirect
method:

$$\bar{X} = A + \frac{\sum d}{N}$$

Where,
$A$ = Subtracted constant
$\sum d$ = Sum of the coded scores
$N$ = Number of individual observations in a series

Mean for the data as shown in Table 2.1 can be computed using the indirect
method in the following manner :

$$\bar{X} = 800 + \frac{884}{7} = 800 + 126.29$$

$$\bar{X} = 926.29 \text{ mm}$$
Note that the mean value comes out the same when computed by either of the two
methods.
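The coding step of the indirect method can be sketched as below. The rainfall values are again assumed for illustration (only the coded sum of 884 and the assumed mean of 800 come from the text), and indirect_mean is a hypothetical helper name.

```python
# Indirect (coded) method: mean = A + sum(d) / N, where d = x - A.
# NOTE: the rainfall values are assumed for illustration; only the coded
# sum (884) and the assumed mean (800) are quoted in the text.
values = [979, 1083, 833, 896, 891, 825, 977]
A = 800  # assumed mean chosen in the text

def indirect_mean(values, assumed_mean):
    """Code each value as a deviation from the assumed mean, then decode."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)

coded_sum = sum(x - A for x in values)
print(coded_sum, round(indirect_mean(values, A), 2))

# Any constant gives the same mean; a well-chosen one only keeps the
# intermediate numbers small.
print(round(indirect_mean(values, 900), 2))
```

The second call suggests why the choice of the assumed mean affects only the size of the working figures, not the result.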


Computing Mean from Grouped Data


The mean is also computed for the grouped data using either direct or indirect
method.


Direct Method


When scores are grouped into a frequency distribution, the individual values
lose their identity. These values are represented by the midpoints of the class
intervals in which they are located. While computing the mean from grouped
data using the direct method, the midpoint of each class interval is multiplied by
its corresponding frequency (f); all values of fx (where x is the midpoint) are
added to obtain $\sum fx$, which is finally divided by the number of observations, i.e.
N. Hence, the mean is calculated using the following formula :

$$\bar{X} = \frac{\sum fx}{N}$$

Where,
$\bar{X}$ = Mean
$f$ = Frequencies
$x$ = Midpoints of class intervals
$N$ = Number of observations (it may also be defined as $\sum f$)


Example 2.2 : Compute the average wage rate of factory workers using data 
given in Table 2.2: 


Table 2.2 : Wage Rate of Factory Workers 


Wage Rate (Rs./day) | Number of workers (f)


Classes | Frequency (f) | Midpoints (x) | U = (x - 100)/20


Where N = $\sum f$ = 99


Table 2.3 provides the procedure for calculating the mean for grouped data.
In the given frequency distribution, ninety-nine workers have been grouped into
five classes of wage rates. The midpoints of these groups are listed in the third
column. To find the mean, each midpoint (x) has been multiplied by the frequency
(f) and their sum ($\sum fx$) divided by N.
The mean may be computed as under using the given formula :

$$\bar{X} = \frac{\sum fx}{N} = \frac{10{,}160}{99} = 102.6$$
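The direct method for grouped data may be sketched as below. The class limits and frequencies are assumptions made for illustration: they are chosen to agree with the five classes, N = 99 and the sum of fx = 10,160 quoted in the text. The function name grouped_mean is hypothetical.

```python
# Direct method for grouped data: mean = sum(f * x) / N, x = class midpoint.
# NOTE: class limits and frequencies are assumed for illustration; they are
# chosen to agree with N = 99 and sum(fx) = 10,160 quoted in the text.
classes = [(50, 70), (70, 90), (90, 110), (110, 130), (130, 150)]
frequencies = [10, 20, 25, 35, 9]

def grouped_mean(classes, frequencies):
    """Return (mean, sum of f*x, N) using class midpoints for x."""
    midpoints = [(low + high) / 2 for low, high in classes]
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return sum_fx / n, sum_fx, n

mean_wage, sum_fx, n = grouped_mean(classes, frequencies)
print(n, sum_fx, round(mean_wage, 1))
```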


Indirect Method 


The following formula can be used for the indirect method for grouped data. The
principles of this formula are similar to those of the indirect method given for
ungrouped data. It is expressed as under :

$$\bar{X} = A \pm \frac{\sum fd}{N}$$

Where,
$A$ = Midpoint of the assumed mean group
(The assumed mean group in Table 2.3 is 90 - 110 with 100 as midpoint.)
$f$ = Frequency
$d$ = Deviation from the assumed mean group (A)
$N$ = Sum of cases or $\sum f$
$i$ = Interval width (in this case, it is 20)


From Table 2.3 the following steps involved in computing the mean using the
indirect method can be deduced :

(i) The mean has been assumed in the group of 90 - 110. It is preferably
assumed from the class as near to the middle of the series as possible.
This procedure minimises the magnitude of computation. In Table 2.3,
A (assumed mean) is 100, the midpoint of the class 90 - 110.

(ii) The fifth column lists the deviations (d) of the midpoint of each class from
the midpoint of the assumed mean group (90 - 110).

(iii) The sixth column shows the product of each f and its corresponding d,
which gives fd. Then, positive and negative values of fd are added
separately and their absolute difference is found ($\sum fd$). Note that the
sign attached to $\sum fd$ replaces the ± sign that follows A in the formula.


The mean using indirect method is computed as under :

$$\bar{X} = A \pm \frac{\sum fd}{N} = 100 + \frac{260}{99} = 100 + 2.6 = 102.6$$

Note : The indirect mean method will work for both equal and unequal class
intervals.
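A minimal sketch of the indirect method for grouped data is given below. The midpoints follow from five classes of width 20 centred on the assumed mean group 90 - 110; the frequencies are assumed for illustration (they agree with N = 99 and the sum of fd = 260 used in the text). The second calculation uses the coded deviation U = (x - 100)/20 and therefore assumes equal class widths.

```python
# Indirect method for grouped data: mean = A + sum(f * d) / N, d = x - A.
# NOTE: the frequencies are assumed for illustration (consistent with
# N = 99 and sum(fd) = 260 quoted in the text).
midpoints = [60, 80, 100, 120, 140]
frequencies = [10, 20, 25, 35, 9]
A, i = 100, 20  # midpoint of the assumed mean group, interval width

n = sum(frequencies)
sum_fd = sum(f * (x - A) for f, x in zip(frequencies, midpoints))
mean_fd = A + sum_fd / n

# Step-deviation variant: u = (x - A) / i, so mean = A + i * sum(f * u) / N.
# This shortcut is valid only when all classes have the same width i.
sum_fu = sum(f * ((x - A) // i) for f, x in zip(frequencies, midpoints))
mean_fu = A + i * sum_fu / n

print(sum_fd, sum_fu, round(mean_fd, 1), round(mean_fu, 1))
```

Both routes lead to the same mean; the coded version merely works with smaller numbers.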
Median


Median is a positional average. It may be defined “as the point in a distribution 
with an equal number of cases on each side of it”. The Median is expressed 
using symbol M. 


Computing Median for Ungrouped Data 


When the scores are ungrouped, these are arranged in ascending or descending 
order. Median can be found by locating the central observation or value in the 
arranged series. The central value may be located from either end of the series 
arranged in ascending or descending order. The following equation is used to 
compute the median : 


$$\text{Value of } \left(\frac{N+1}{2}\right)\text{th item}$$





Example 2.3: Calculate median height of mountain peaks in parts of the 
Himalayas using the following: 


8,126 m, 8,611m, 7,817 m, 8,172 m, 8,076 m, 8,848 m, 8,598 m. 


Computation : Median (M) may be calculated in the following steps : 
(i) Arrange the given data in ascending or descending order. 
(ii) Apply the formula for locating the central value in the series. Thus : 


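$$\text{Value of } \left(\frac{N+1}{2}\right)\text{th item} = \left(\frac{7+1}{2}\right)\text{th item} = \left(\frac{8}{2}\right)\text{th item}$$

4th item in the arranged series will be the Median.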


Arrangement of data in ascending order :


7,817; 8,076; 8,126; 8,172; 8,598; 8,611; 8,848


The 4th item is 8,172. Hence,
M = 8,172 m
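The steps above can be sketched in Python as follows. The function name median is chosen for illustration; the even-N branch applies the rule that the median is the average of the two middle-ranking values.

```python
# Median for ungrouped data: the ((N + 1) / 2)th item of the sorted series.
peaks_m = [8126, 8611, 7817, 8172, 8076, 8848, 8598]  # heights, Example 2.3

def median(values):
    """Return the middle value of the sorted series."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd N: single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average of two

print(median(peaks_m))
```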


Computing Median for Grouped Data 


When the scores are grouped, we have to find the value of the point where an
individual or observation is centrally located in the group. It can be computed
using the following formula :

$$M = l + \frac{i}{f}\,(m - c)$$

Where,
$M$ = Median for grouped data
$l$ = Lower limit of the median class
$i$ = Interval
$f$ = Frequency of the median class
$m$ = Median number, i.e. $\frac{N}{2}$
$N$ = Total number of frequencies or number of observations
$c$ = Cumulative frequency of the pre-median class.


Example 2.4 : Calculate the median for the following distribution :


class | 50-60 | 60-70 | 70-80 | 80-90 | 90-100 | 100-110


Table 2.4 : Computation of Median

Class | Frequency (f) | Cumulative Frequency (F) | Calculation of Median Class
50-60
60-70
70-80
80-90 (median group)
90-100
100-110


The median is computed in the steps given below :

(i) The frequency table is set up as in Table 2.4.
(ii) Cumulative frequencies (F) are obtained by adding each normal
frequency of the successive interval groups, as given in column 3 of
Table 2.4.
(iii) Median number is obtained by $\frac{N}{2}$, i.e. $\frac{50}{2} = 25$ in this case, as shown in
column 4 of Table 2.4.
(iv) Count into the cumulative frequency distribution (F) from the top
towards the bottom until the value next greater than $\frac{N}{2}$ is reached. In this
example, $\frac{N}{2}$ is 25, which falls in the class interval of 80-90 with a
cumulative frequency of 37; thus the cumulative frequency of the
pre-median class is 21 and the actual frequency of the median class is 16.
(v) The median is then computed by substituting all the values determined
in step (iv) in the following equation :

$$M = l + \frac{i}{f}\,(m - c) = 80 + \frac{10}{16}(25 - 21) = 80 + \frac{5}{8} \times 4 = 80 + \frac{5}{2} = 80 + 2.5$$

M = 82.5
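The procedure for grouped data can be sketched as below. The frequencies other than that of the median class (16) are assumed for illustration; they are chosen to agree with N = 50 and with the cumulative frequencies 21 and 37 mentioned in the text. The function name grouped_median is hypothetical.

```python
# Median for grouped data: M = l + (i / f) * (m - c), with m = N / 2.
# NOTE: frequencies other than that of the median class (16) are assumed;
# they agree with N = 50 and c = 21 given in the text.
lower_limits = [50, 60, 70, 80, 90, 100]
frequencies = [3, 7, 11, 16, 8, 5]
width = 10

def grouped_median(lower_limits, frequencies, width):
    m = sum(frequencies) / 2          # median number, N / 2
    cumulative = 0                    # cumulative frequency before this class
    for l, f in zip(lower_limits, frequencies):
        if cumulative + f >= m:       # first class whose F reaches m
            return l + (width / f) * (m - cumulative)
        cumulative += f
    raise ValueError("empty distribution")

print(grouped_median(lower_limits, frequencies, width))
```

The loop mirrors step (iv): it counts down the cumulative frequencies until the median number is reached, then substitutes l, i, f, m and c into the formula.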


Mode 


The value that occurs most frequently in a distribution is referred to as mode. It
is symbolised as Z or $M_0$. Mode is a measure that is less widely used compared
to mean and median. There can be more than one mode in a given data set.


Computing Mode for Ungrouped Data 


While computing mode from the given data sets all measures are first arranged 
in ascending or descending order. It helps in identifying the most frequently 
occurring measure easily. 


Example 2.5 : Calculate mode for the following test scores in geography for ten 
students : 
61, 10, 88, 37, 61, 72, 55, 61, 46, 22 
Computation : To find the mode the measures are arranged in ascending order 
as given below: 
10, 22, 37, 46, 55, 61, 61, 61, 72, 88. 


The measure 61, occurring three times in the series, is the mode in the given
dataset. As no other number occurs as frequently in the dataset, the dataset is
unimodal.


Example 2.6 : Calculate the mode using a different sample of ten other students, 
who scored: 

82, 11, 57, 82, 08, 11, 82, 95, 41, 11.
Computation : Arrange the given measures in an ascending order as shown 
below : 

08, 11, 11, 11, 41, 57, 82, 82, 82, 95 

It can easily be observed that the measures 11 and 82 both occur three times
in the distribution. The dataset, therefore, is bimodal.
If three values have equal and highest frequency, the series is trimodal. Similarly,
a recurrence of many measures in a series makes it multimodal. However, when
no measure is repeated in a series, it is designated as without mode.
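The counting involved can be sketched in Python as below. The helper name modes is hypothetical; it returns every value sharing the highest frequency and an empty list when no measure is repeated.

```python
from collections import Counter

def modes(values):
    """Return every value sharing the highest frequency ([] if none repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no measure is repeated: without mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([61, 10, 88, 37, 61, 72, 55, 61, 46, 22]))  # Example 2.5
print(modes([82, 11, 57, 82, 8, 11, 82, 95, 41, 11]))   # Example 2.6
```

The standard library function statistics.multimode (Python 3.8 and later) gives a similar result, but it returns all the values when none is repeated, so a small custom function is used here.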


Comparison of Mean, Median and Mode 


The three measures of central tendency can easily be compared with the
help of the normal distribution curve. The normal curve refers to a frequency
distribution whose graph of scores is often called a bell-shaped curve. Many
human traits such as intelligence, personality scores and student achievements
have normal distributions. The bell-shaped curve is symmetrical. In other words,
most of the observations lie on and around the
middle value. As one approaches the extreme values, the number of observations
reduces in a symmetrical manner. A normal curve can have high or low data
variability. An example of a normal distribution curve is given in Fig. 2.3.


Fig. 2.3 : Normal Distribution Curve (horizontal axis : Scores)


The normal distribution has an important characteristic. The mean, median 
and mode are the same score (a score of 100 in Fig. 2.3) because a normal 
distribution is symmetrical. The score with the highest frequency occurs in the 
middle of the distribution and exactly half of the scores occur above the middle 
and half of the scores occur below. Most of the scores occur around the middle of 
the distribution or the mean. Very high and very low scores do not occur frequently 
and are, therefore, considered rare. 

If the data are skewed or distorted in some way, the mean, median and mode 
will not coincide and the effect of the skewed data needs to be considered (Fig. 
2.4 and 2.5). 
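A small numerical check may make this concrete. The two sets of scores below are hypothetical, invented only to illustrate the point; the ordering mean > median > mode seen in the positively skewed set is the usual tendency, not a rule stated in the text.

```python
import statistics

# Hypothetical scores, invented only to illustrate the effect of skew.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # long tail of high scores

def centre(values):
    """Return (mean, median, mode) of the given scores."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

print(centre(symmetric))      # the three measures coincide
print(centre(positive_skew))  # the mean is pulled towards the long tail
```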


Fig. 2.4 : Positive Skew (vertical axis : Frequency; horizontal axis : Scores)


Fig. 2.5 : Negative Skew (horizontal axis : Scores)


Measures of Dispersion 


The measures of Central tendency alone do not adequately describe a distribution 
as they simply locate the centre of a distribution and do not tell us anything 
about how the scores or measurements are scattered in relation to the centre. Let 
us use the data given in Table 2.5 and 2.6 to understand the limitations of the 
measures of central tendency. 


Table 2.5 : Scores of Individuals

Individual | Score
X1
X2
X3
X4
X5


Table 2.6 : Scores of Individuals

Individual | Score
X1
X2
X3
X4
X5


$\bar{X}$ = 50 for both the distributions


It can be observed that the mean derived from the two data sets (Table 2.5 and
2.6) is the same, i.e. 50. The highest and the lowest scores shown in Table 2.5 are 55
and 45 respectively. The distribution in Table 2.6 has a high score of 98 and a
low score of zero. The range of the first distribution is 10, whereas it is 98 in the
second distribution. Although the mean for both the groups is the same, the
first group is obviously stable or homogeneous as compared to the distribution
of scores of the second group, which is highly unstable or heterogeneous. This
raises the question of whether the mean is a sufficient indicator of the total character
of distributions. The examples show clearly that it is not. Thus,
to get a better picture of a distribution, we need to use a measure of central
tendency together with a measure of dispersion or variability.
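The contrast can be sketched numerically as below. The individual scores are assumed for illustration; they are chosen to agree with the mean of 50 and with the highest and lowest scores quoted for the two tables.

```python
# NOTE: the individual scores are assumed for illustration; they agree with
# the mean (50) and the highest/lowest scores quoted for Tables 2.5 and 2.6.
table_2_5 = [52, 55, 50, 48, 45]
table_2_6 = [28, 0, 98, 55, 69]

def mean_and_range(scores):
    """Return (mean, highest score minus lowest score)."""
    return sum(scores) / len(scores), max(scores) - min(scores)

print(mean_and_range(table_2_5))  # same mean, small spread
print(mean_and_range(table_2_6))  # same mean, large spread
```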

The term dispersion refers to the scattering of scores about the measure of
central tendency. It is used to measure the extent to which individual items or
numerical data tend to vary or spread about an average value. Thus, the
dispersion is the degree of spread or scatter or variation of measures about a
central value.
The dispersion serves the following two basic purposes : 
(i) It gives us the nature of composition of a series or distribution, and 
(ii) It permits comparison of the given distributions in terms of stability or 
homogeneity. 


Methods of Measuring Dispersion 


The following methods are used as measures of dispersion : 

1. Range 

2. Quartile Deviation 

3. Mean Deviation 

4. Standard Deviation and Coefficient of Variation (CV) 

5. Lorenz Curve 

Each of these methods has definite advantages as well as limitations. Hence,
each method needs to be used with care. The Standard
Deviation ($\sigma$) as an absolute measure of dispersion and the Coefficient of Variation
(CV) as a relative measure of dispersion, besides the Range, are the most commonly
used measures of dispersion. We will discuss how each one of these measures
is computed.


Range 


Range (R) is the difference between maximum and minimum values in a series of 
distribution. This way it simply represents the distance from the smallest to the 
largest score in a series. It can also be defined as the highest score minus the 
lowest score. 


Range for Ungrouped Data 





Example 2.7 : Calculate the range for the following distribution of daily wages:
Rs. 40, 42, 45, 48, 50, 52, 55, 58, 60, 100.


Computation of Range
The R can be calculated with the help of the following formula :
R = L - S
Where
‘R’ is Range,
‘L’ and ‘S’ are the largest and smallest values respectively in a series.
Hence,
R = L - S
= 100 - 40 = 60
If we eliminate the 10th case, R becomes 20 (60 - 40). The elimination of one
score has reduced the R to just one-third. It is obvious that the difficulty with R
as a measure of variability is that its value is wholly dependent upon the two
extreme scores. Thus, as a measure of dispersion, R functions much the same
way as mode does as a measure of central tendency. Both the measures are
highly unstable.
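The sensitivity of the range to a single extreme score can be sketched as below, using the wages of Example 2.7. The function name value_range is hypothetical.

```python
wages = [40, 42, 45, 48, 50, 52, 55, 58, 60, 100]  # Example 2.7, in Rs.

def value_range(values):
    """R = L - S: largest value minus smallest value."""
    return max(values) - min(values)

r_all = value_range(wages)
r_without_extreme = value_range(wages[:-1])  # drop the 10th case (Rs. 100)
print(r_all, r_without_extreme)
```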
Standard Deviation


Standard deviation (SD) is the most widely used measure of dispersion. It is
defined as the square root of the average of squares of deviations. It is always
calculated around the mean. The standard deviation is the most stable measure
of variability and is used in many other statistical operations. The Greek
character $\sigma$ denotes it.

To obtain SD, the deviation of each score from the mean (x) is first squared ($x^2$). It
is important to note that this step makes all negative signs of deviations positive.
It saves SD from the major criticism of mean deviation, which uses the modulus $|x|$.
Then, all of the squared deviations are summed to give $\sum x^2$ (care should be taken that
these are not summed first and then squared). This sum of the squared deviations
($\sum x^2$) is divided by the number of cases and then the square root is taken. Therefore,
Standard Deviation is defined as the root mean square deviation. For a
given data set, it is computed using the following formula :

$$\sigma = \sqrt{\frac{\sum x^2}{N}}$$


During these steps, we come across a quantity just before taking the square root. It is
given a special name, the variance. The variance is widely used in advanced
statistical operations. Its square root is the standard deviation; conversely, the
square of the SD is the variance.


Standard Deviation for Ungrouped Data 


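Example 2.8 : Calculate the standard deviation for the following scores :


01, 03, 05, 07, 09


Table 2.7 : Computation of Standard Deviation

X | x = X - $\bar{X}$ | $x^2$
01 | -4 | 16
03 | -2 | 4
05 | 0 | 0
07 | 2 | 4
09 | 4 | 16
$\sum X$ = 25 | $\sum x$ = 0 | $\sum x^2$ = 40

$$\bar{X} = \frac{25}{5} = 5$$

$$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.828 \approx 2.83$$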


Let us summarise the steps used in the above computation :


(i) All the scores have been placed in the column marked X.
(ii) The mean has been found by summing the raw scores and dividing by N.

(iii) The deviation of each raw score (x) has been obtained by subtracting the
mean from it. A check on our work is that the sum of the x should
be zero. We find that this is true for our exercise.

(iv) Each value of x has been squared and summed.

(v) The sum of the $x^2$ values has been divided by N. Recall that the resultant
is the variance.

(vi) Its square root has been found to obtain the Standard Deviation.
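The six steps above can be sketched in Python as follows, using the scores of Example 2.8. The function name standard_deviation is chosen for illustration; the divisor is N, as in the text.

```python
import math

scores = [1, 3, 5, 7, 9]  # Example 2.8

def standard_deviation(values):
    """Return (SD, variance) using N as the divisor, as in the text."""
    n = len(values)
    mean = sum(values) / n                          # step (ii)
    deviations = [v - mean for v in values]         # step (iii)
    assert abs(sum(deviations)) < 1e-9              # check: deviations sum to 0
    variance = sum(d ** 2 for d in deviations) / n  # steps (iv) and (v)
    return math.sqrt(variance), variance            # step (vi)

sd, variance = standard_deviation(scores)
print(variance, round(sd, 3))
```

The standard library function statistics.pstdev uses the same N divisor and should give the same value for these scores.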
Computation of Standard Deviation for Grouped Data


Example : Calculate the standard deviation for the following distribution:


The method of obtaining SD for grouped data has been explained in
the table below. The initial steps up to column 4 are the same as those we
followed in the computation of the mean for grouped data. We begin by
assuming our mean to exist in the interval group of 150-160; hence a
deviation value of zero has been assigned to this group. Likewise, other
deviations are determined. Values in column 4 (fx) are obtained by the
multiplication of the values in the two previous columns. Values in column
5 ($fx^2$) are obtained by multiplying the values given in columns 3 and 4.
Then the various columns have been summed.


Classes | f | x | fx | $fx^2$
120 - 130
130 - 140
140 - 150
150 - 160
160 - 170
170 - 180


The following formula is used to calculate the Standard Deviation :

$$SD = i\sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2}$$

Where $i$ is the class-interval width (10 in this example) and $x$ is the deviation of
each class from the assumed mean class, counted in class intervals.
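The computation for grouped data can be sketched as below. The frequencies are assumed for illustration, since the text does not state them; the coded deviations count class intervals away from the assumed mean class 150-160. The formula used is the step-deviation form, SD = i × sqrt(Σfx²/N − (Σfx/N)²), taken here as an assumption about the intended formula.

```python
import math

# NOTE: the frequencies are assumed for illustration. The coded deviations x
# count class intervals away from the assumed mean class 150-160 (x = 0).
classes = ["120-130", "130-140", "140-150", "150-160", "160-170", "170-180"]
f = [2, 4, 6, 12, 10, 6]
x = [-3, -2, -1, 0, 1, 2]
i = 10  # class-interval width

n = sum(f)
sum_fx = sum(fj * xj for fj, xj in zip(f, x))        # column 4 total
sum_fx2 = sum(fj * xj ** 2 for fj, xj in zip(f, x))  # column 5 total
sd = i * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)
print(n, sum_fx, sum_fx2, round(sd, 2))
```

The second term under the square root corrects for the fact that the assumed mean class need not contain the true mean.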





Coefficient of Variation (CV)


When the observations for different places or periods are expressed in different
units of measurement and are to be compared, the coefficient of variation (CV)
proves very useful. CV expresses the standard deviation as a percentage of
the mean. It is determined using the following formula :

$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$

$$CV = \frac{\sigma}{\bar{X}} \times 100$$

The CV for the dataset given in Table 2.7 will, hence, be as under :

$$CV = \frac{\sigma}{\bar{X}} \times 100 = \frac{2.83}{5} \times 100$$

CV = 56.6%


Coefficient of Variation for grouped data can also be calculated using the 
same formula. 
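A minimal sketch of the CV computation is given below, using the scores of Table 2.7. The function name coefficient_of_variation is hypothetical, and the standard deviation uses N as the divisor.

```python
import math

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using the N-divisor SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sd / mean * 100

cv = coefficient_of_variation([1, 3, 5, 7, 9])  # scores of Table 2.7
print(round(cv, 1))

# Changing the unit of measurement (here, multiplying by 1000) leaves the
# CV unchanged, which is why it suits comparisons across different units.
cv_rescaled = coefficient_of_variation([1000, 3000, 5000, 7000, 9000])
print(round(cv_rescaled, 1))
```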


Rank Correlation 


The statistical methods discussed so far were concerned with the analysis of a
single variable. We will now discuss the methods of exploring the relationship between
two variables and the way this relationship is expressed numerically. When dealing
with two or more sets of data, we naturally want to know whether or not changes
in one variable produce changes in some other variable.

Often our interest lies in knowing the nature of relationship or interdependence
between two or more sets of data. Correlation serves this purpose.
It is basically a measure of relationship between two or more
sets of data. Since we study the way they vary, we call these events variables.
Thus, the term correlation refers to the nature and strength of
correspondence or relationship between two variables. The terms nature
and strength in the definition refer to the direction and degree with which
the variables co-vary.


Direction of Correlation 


It is our common experience that an input is made to get some output. There
could be three possibilities.
1. With the increase in input the output also increases.
2. With the increase in the input the output decreases.
3. Change in the input does not lead to change in the output.
In the first case, the input and output change in the same direction. The two
are said to be positively correlated.
In the second case, the input and output change in opposite directions, and
they are said to be negatively correlated.
In the third case, change in the input has no relationship with the output;
hence, it is said that these do not have a statistically significant relationship.
Let us now consider Fig. 2.7, which looks just the opposite of Fig. 2.6. The plotted
values run from the upper left to the lower right of the graph. Notice that for
every increase of one unit on the X-axis, there is a corresponding decrease of two
units on the Y-axis. It is an example of a negative correlation. It means that the
two variables have a tendency to move opposite to each other, i.e. if one variable
increases, the other decreases and vice versa. We can find such relationships
existing between various geographical pairs of variables. Correlations between
height above sea level and air pressure, and between temperature and air pressure,
are a few examples. It implies that the obtained figure of correlation must be
preceded by the arithmetical sign (plus or minus), more importantly in the
negative correlation.





Fig. 2.6 : Perfect Positive Correlation (horizontal axis : X)

Fig. 2.7 : Perfect Negative Correlation (horizontal axis : X)


Degree of Correlation


Once the direction of correlation, negative or positive, has been established,
a natural curiosity arises to know the degree of correspondence or
association of the two variables. The maximum degree of correspondence or
relationship goes up to 1 (one) in mathematical terms. On adding the element of
direction, correlation spreads over the range from -1 to +1
through zero. Its magnitude can never be more than one. The spread can also be
translated into a linear shape, as shown in Fig. 2.8. A correlation of magnitude 1 is
known as perfect correlation (whether positive or negative). Between the two
divergent points of perfect correlation lies 0 (zero) correlation, a point of no
correlation or absence of any correlation between the variables.


Scale of correlation from -1 to +1 :
-1 (Perfect Negative Correlation) | STRONG | MEDIUM (or Moderate) | WEAK | 0 (No Correlation) | WEAK | MEDIUM (or Moderate) | STRONG | +1 (Perfect Positive Correlation)
Correlation increases in strength away from 0 in both directions.


Fig. 2.8 : Spread of Direction and Degree of Correlation



Perfect Correlations


Figs. 2.6 and 2.7 have been constructed to show the typical relationship between
two variables. Notice that these graphs show the scattering of X - Y values.
Therefore, such graphs are referred to as scatter gram or scatter plot. It may
be noted from Fig. 2.6 that pairs of values like these, when plotted, fall along
a straight line, and when this straight line runs from the lower left of the scatter
plot to the upper right, it is an example of a perfect positive correlation (1.00).
Fig. 2.7 is just the opposite of this. All the points again fall along a straight line,
which now runs from the upper left-hand part of the scatter gram to its lower
right. It is an example of a perfect negative correlation (with a value of -1.00).
No Correlation (or Zero Correlation) : When either variable in the pair does not
respond to changes in the other, the correlation comes to zero.
This is the state of no correlation or zero correlation. It is shown in Fig. 2.9.
Scatter plot A shows no correlation when Y does not respond to changes in X.
Similarly, zero correlation occurs in Scatter plot B when X does not respond to
changes in Y.


Fig. 2.9 : Scatter plot showing No Correlation (panels A and B; horizontal axis : X)


Other Correlations 


Between the perfect correlations (±1) and zero correlation lie generalised
conditions popularly referred to as weak, moderate and strong correlations. These
conditions are exhibited in Figs. 2.10, 2.11 and 2.12 respectively. Notice
the spreading or scattering of the plotted points and the assignment of the
terms weak, moderate and strong to them (generalised terms having no specific
limits). The larger the scattering, the weaker the correlation; the smaller the
scattering, the stronger the correlation. When the plotted points fall on a straight
line, the correlation is perfect (Figs. 2.6 and 2.7).





Fig. 2.10 : Weak Negative Correlation
Fig. 2.11 : Moderate Positive Correlation
Fig. 2.12 : Strong Positive Correlation
Methods of Calculating Correlation


There are various methods by which correlation can be calculated. However, 
under the constraints of time and space, we will discuss the Spearman’s Rank 
Correlation method only. 


Spearman’s Rank Correlation

Spearman devised a method of computing correlation with the help of ranks.
The method is popularly known as Spearman’s Rank Correlation, symbolised
as $\rho$ (the Greek letter rho). Spearman’s Rank Correlation method is widely
used. The computation of the correlation is undertaken in the steps given
below:

(i) Copy the data related to X-Y variables given in the exercise and put
them in the first and second columns of the table.

(ii) Both the variables are to be ranked separately. The ranks of the X-variable
are to be recorded in column 3 headed by XR (ranks of X). Similarly,
the ranks of the Y-variable (YR) are to be recorded in the fourth column.
The highest value in the data is to be awarded rank one, the second highest
rank two and so on. Suppose the data for the X-variable are 4, 8, 2, 10, 1,
9, 7, 3, 0 and 5; the XR will then be 6, 3, 8, 1, 9, 2, 4, 7, 10 and 5 respectively.
Notice that the last rank (10 in this case) equals the number of
observations. Assignment of YR is also done in the same way.

(iii) Now that both XR and YR have been obtained, find the difference
between the two sets of ranks (disregarding the sign plus or minus) and
record it in the fifth column. The sign of the difference is of no
importance, since these differences are squared in the next operation.

(iv) Each of these differences is squared and placed in the sixth column,
and the sum of this column of squares is obtained.

(v) Then the computation of the rank correlation is done by the application
of the following equation:


$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$


Where,
$\rho$ = rank correlation
$\sum D^2$ = sum of the squares of the differences between two sets of ranks
$N$ = the number of pairs of X-Y


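Example 2.9 : Calculate Spearman’s Rank Correlation with the help of the
following data :


Scores in Economics (X) : 02 08 00 20 12 16 06 18 09 10

Scores in Geography (Y) : 04 12 06 24 16 18 08 20 09 10


Table 2.8 : Computation of Spearman’s Rank Correlation

X | Y | XR | YR | D | $D^2$
02 | 04 | 9 | 10 | 1 | 1
08 | 12 | 7 | 5 | 2 | 4
00 | 06 | 10 | 9 | 1 | 1
20 | 24 | 1 | 1 | 0 | 0
12 | 16 | 4 | 4 | 0 | 0
16 | 18 | 3 | 3 | 0 | 0
06 | 08 | 8 | 8 | 0 | 0
18 | 20 | 2 | 2 | 0 | 0
09 | 09 | 6 | 7 | 1 | 1
10 | 10 | 5 | 6 | 1 | 1
$\sum D^2$ = 8


Calculation:

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 8}{10(10^2 - 1)} = 1 - \frac{48}{10(100 - 1)} = 1 - \frac{48}{10(99)} = 1 - \frac{48}{990}$$

$$\rho = 1 - 0.05 = 0.95$$

Where, $\rho$ is Rank Correlation; D is difference between the ranks of X and Y; and
N is number of pairs of X-Y.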


Rho makes a good substitute for other types of correlation when the number
of cases is small. It is of little use when N is large, because by the time all the
data are ranked, another type of correlation could have been calculated.
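The ranking and the rank correlation formula can be sketched in Python as below, using the scores of Example 2.9 (the third Economics score is read as 00). The helper names ranks_descending and spearman_rho are hypothetical, and the simple ranking assumes that no two values are tied.

```python
def ranks_descending(values):
    """Rank 1 for the highest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Return (rho, sum of squared rank differences)."""
    n = len(x)
    xr, yr = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(xr, yr))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

economics = [2, 8, 0, 20, 12, 16, 6, 18, 9, 10]   # X, Example 2.9
geography = [4, 12, 6, 24, 16, 18, 8, 20, 9, 10]  # Y, Example 2.9
rho, sum_d2 = spearman_rho(economics, geography)
print(sum_d2, round(rho, 2))

# Ranking illustration from step (ii)
print(ranks_descending([4, 8, 2, 10, 1, 9, 7, 3, 0, 5]))
```

When values are tied, average ranks are normally assigned and this simple formula is only approximate, so the sketch should not be used unchanged on data with ties.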
Exercises

1. Choose the correct answer from the four alternatives given below:
(i) The measure of central tendency that does not get affected by extreme values:
(a) Mean (b) Mean and Mode
(c) Mode (d) Median
(ii) The measure of central tendency always coinciding with the hump of any
distribution is:
(a) Median (b) Median and Mode
(c) Mean (d) Mode
(iii) A scatter plot represents negative correlation if the plotted values run from:
(a) Upper left to lower right (b) Lower left to upper right
(c) Left to right (d) Upper right to lower left
2. Answer the following questions in about 30 words:
(i) Define the mean.
(ii) What are the advantages of using mode ?
(iii) What is dispersion ?
(iv) Define correlation.
(v) What is perfect correlation ?
(vi) What is the maximum extent of correlation?
3. Answer the following questions in about 125 words:
(i) Explain relative positions of mean, median and mode in a normal
distribution and skewed distribution with the help of diagrams.
(ii) Comment on the applicability of mean, median and mode (hint: from their
merits and demerits).
(iii) Explain the process of computing Standard Deviation with the help of an
imaginary example.
(iv) Which measure of dispersion is the most unstable statistic and why?
(v) Write a detailed note on the degree of correlation.
(vi) What are various steps for the calculation of rank order correlation?


Activity 


1. Take an imaginary example applicable to geographical analysis and explain 
direct and indirect methods of calculating mean from ungrouped data. 




2. Draw scatter plots showing different types of perfect correlations. 




